March 8, 2006
This document provides a quick overview of the OSES crawler plug-in creation process.
PATH on Windows, LD_LIBRARY_PATH on UNIX) must contain the path to it. Make sure that Oracle is started from this environment. Because the Oracle process spawns the crawler, the crawler automatically inherits all of Oracle's environment variables, including the library path.
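A plug-in that depends on a native library can check at startup that the library path it inherited contains the library's directory. The following is a minimal sketch under stated assumptions, not part of the OSES API: `LibraryPathCheck` and its methods are hypothetical names, and path entries are compared as exact strings (no case folding or trailing-separator normalization).

```java
import java.io.File;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical helper (not part of the OSES SDK): checks whether a directory is
// listed in a path-list value such as PATH (Windows) or LD_LIBRARY_PATH (UNIX).
final class LibraryPathCheck {
    private LibraryPathCheck() {}

    // Exact string match per entry; the separator is a parameter so either
    // platform's format can be checked.
    static boolean isOnPath(String pathList, String dir, String separator) {
        if (pathList == null || pathList.isEmpty()) {
            return false;
        }
        for (String entry : pathList.split(Pattern.quote(separator))) {
            if (entry.equals(dir)) {
                return true;
            }
        }
        return false;
    }

    // Name of the environment variable that holds the library path on this platform.
    static String libraryPathVariable() {
        String os = System.getProperty("os.name", "").toLowerCase(Locale.ROOT);
        return os.startsWith("windows") ? "PATH" : "LD_LIBRARY_PATH";
    }

    // True if dir is on the library path this JVM inherited from its parent
    // process (for the crawler, the Oracle process).
    static boolean inheritedPathContains(String dir) {
        return isOnPath(System.getenv(libraryPathVariable()), dir, File.pathSeparator);
    }
}
```

For example, a plug-in could call `LibraryPathCheck.inheritedPathContains("/opt/myplugin/lib")` (a hypothetical directory) during initialization and log an error if it returns false, which points directly at an Oracle instance started from the wrong environment.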
Interface Summary

| Interface | Description | Key methods |
| --- | --- | --- |
| CrawlerPlugin | Implemented by the plug-in writer. The crawl() method is the heart of the plug-in. | crawl() |
| CrawlerPluginManager | Implemented by the plug-in writer. Responsible for plug-in registration and for materializing the plug-in instance used by the crawler. | init(), getCrawlerPlugin(), getPluginParameters() |
| CrawlingThreadService | Entry point for submitting documents to the crawler. | submitForProcessing(DocumentContainer target) |
| DataSourceService | Optional service for managing the data source. | delete(url), indexNow(), registerGlobalLOV() |
| DocumentAcl | Object for holding the document's access control principals. Save it to the DocumentMetadata object. | addPrincipal(), addDenyPrincipal() |
| DocumentContainer | A "holder" for the document. The metadata and the document status must be set before the document is submitted. | setMetadata(DocumentMetadata), setDocument(InputStream), setDocument(Reader), setDocumentStatus() |
| DocumentMetadata | Object for storing document metadata and access control information. | setACLInfo(DocumentAcl), setAttribute(), setContentType(), setSourceHierarchy() |
| GeneralService | Entry point for getting DataSourceService, QueueService, and LoggingService. Factory for creating DocumentAcl, DocumentMetadata, DocumentContainer, and LovInfo objects. | |
| Logger | Logging interface for writing messages to the crawler log file. | error(), fatal(), info(), warn() |
| LovInfo | Object for holding the list of values (LOV) of a search attribute. | addAttributeValue(name, value) |
| ParameterValues | Interface the plug-in uses to read the values of data source parameters. | |
| QueueService | Optional service for storing the URLs of pending documents. | enqueue(), getNextItem() |
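The following sketch shows how the roles in the table fit together during a crawl: pending URLs are taken from a queue, each document gets metadata and an access control principal, the container's metadata and status are set, and the container is submitted. It deliberately uses small, hypothetical stand-in types (`SketchMetadata`, `SketchContainer`, `SketchSubmitter`, `SketchPlugin`) rather than the real SDK interfaces, whose exact signatures are not given in this overview; the status string "NEW" and the principal "everyone" are placeholders.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// HYPOTHETICAL stand-ins for DocumentMetadata/DocumentAcl and DocumentContainer.
// They only model the roles from the Interface Summary, not the SDK signatures.
final class SketchMetadata {
    final Map<String, String> attributes = new LinkedHashMap<>(); // cf. setAttribute()
    final List<String> principals = new ArrayList<>();            // cf. DocumentAcl.addPrincipal()
    String contentType;                                           // cf. setContentType()
}

final class SketchContainer {
    SketchMetadata metadata;  // cf. setMetadata()
    InputStream document;     // cf. setDocument(InputStream)
    String status;            // cf. setDocumentStatus(); values here are placeholders

    // The summary says metadata and status must be set before submission.
    boolean isReady() {
        return metadata != null && status != null;
    }
}

// Stand-in for CrawlingThreadService.submitForProcessing(DocumentContainer).
interface SketchSubmitter {
    void submitForProcessing(SketchContainer target);
}

// Stand-in for a CrawlerPlugin whose crawl() drains a queue of pending URLs,
// in the spirit of QueueService.enqueue() and getNextItem().
final class SketchPlugin {
    private final Deque<String> queue = new ArrayDeque<>();
    private final SketchSubmitter submitter;

    SketchPlugin(SketchSubmitter submitter) {
        this.submitter = submitter;
    }

    void enqueue(String url) {
        queue.addLast(url);
    }

    // Returns the next pending URL, or null when the queue is empty.
    String getNextItem() {
        return queue.pollFirst();
    }

    void crawl() {
        String url;
        while ((url = getNextItem()) != null) {
            SketchMetadata md = new SketchMetadata();
            md.attributes.put("url", url);
            md.contentType = "text/plain";
            md.principals.add("everyone"); // placeholder principal

            SketchContainer c = new SketchContainer();
            c.metadata = md;
            c.document = new ByteArrayInputStream(
                    ("content of " + url).getBytes(StandardCharsets.UTF_8));
            c.status = "NEW"; // placeholder status value
            if (!c.isReady()) {
                throw new IllegalStateException("metadata and status must be set: " + url);
            }
            submitter.submitForProcessing(c);
        }
    }
}
```

A real plug-in would fetch each document's content from its repository instead of fabricating it; the byte array only gives the container a stream to carry.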
Class Summary

| Class | Description |
| --- | --- |
| ParameterInfo | Describes the general properties of a parameter. CrawlerPluginManager returns a list of ParameterInfo objects through getPluginParameters(). |
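The idea behind getPluginParameters() can be sketched as follows: the manager declares each parameter's general properties, and the plug-in later reads the configured values. `SketchParameterInfo` and `SketchPluginManager` are hypothetical stand-ins; the properties modeled here (name, description, required flag, default value) and the parameter names `seed_url` and `max_docs` are assumptions, not the documented fields of ParameterInfo.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// HYPOTHETICAL stand-in for ParameterInfo; the real class's properties may differ.
final class SketchParameterInfo {
    final String name;
    final String description;
    final boolean required;
    final String defaultValue; // null when there is no default

    SketchParameterInfo(String name, String description, boolean required, String defaultValue) {
        this.name = name;
        this.description = description;
        this.required = required;
        this.defaultValue = defaultValue;
    }
}

// HYPOTHETICAL stand-in for a CrawlerPluginManager's parameter handling.
final class SketchPluginManager {
    // Analogue of getPluginParameters(): describes the data source parameters.
    List<SketchParameterInfo> getPluginParameters() {
        return Arrays.asList(
                new SketchParameterInfo("seed_url", "Starting URL of the repository", true, null),
                new SketchParameterInfo("max_docs", "Maximum documents per crawl", false, "1000"));
    }

    // Rough analogue of reading values through ParameterValues: each declared
    // parameter takes the supplied value or, failing that, its default.
    Map<String, String> resolve(Map<String, String> supplied) {
        Map<String, String> values = new LinkedHashMap<>();
        for (SketchParameterInfo p : getPluginParameters()) {
            String v = supplied.getOrDefault(p.name, p.defaultValue);
            if (v == null && p.required) {
                throw new IllegalArgumentException("missing required parameter: " + p.name);
            }
            if (v != null) {
                values.put(p.name, v);
            }
        }
        return values;
    }
}
```

Declaring defaults and required flags alongside the parameter list keeps validation in one place, so a misconfigured data source fails before crawling starts.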
Exception Summary

| Exception | Description |
| --- | --- |
| PluginException | Exception thrown by the plug-in to report an error. The crawler shuts down if isFatalException() returns true. |
| ProcessingException | Exception thrown by the crawler to the plug-in to indicate trouble processing the plug-in's request. If the error is fatal, the crawler tries to shut down; otherwise, it is up to the plug-in whether to continue to the next document. |
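The exception contract can be illustrated with a loop that skips a document after a non-fatal processing error (one of the choices the summary leaves to the plug-in) and stops on a fatal one, reporting it as a fatal plug-in exception. The classes below are hypothetical stand-ins: only isFatalException() comes from the summary, while the constructors, the `isFatal()` accessor on the processing exception, and `SketchSubmit` are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// HYPOTHETICAL stand-in for PluginException; only isFatalException() is from the summary.
class SketchPluginException extends Exception {
    private final boolean fatal;

    SketchPluginException(String message, boolean fatal) {
        super(message);
        this.fatal = fatal;
    }

    boolean isFatalException() {
        return fatal;
    }
}

// HYPOTHETICAL stand-in for ProcessingException; isFatal() is an assumed accessor.
class SketchProcessingException extends Exception {
    private final boolean fatal;

    SketchProcessingException(String message, boolean fatal) {
        super(message);
        this.fatal = fatal;
    }

    boolean isFatal() {
        return fatal;
    }
}

// Stand-in for the crawler-side submission call, which may report trouble.
interface SketchSubmit {
    void submit(String url) throws SketchProcessingException;
}

final class SketchErrorPolicy {
    private SketchErrorPolicy() {}

    // Submits each URL and returns the ones skipped after non-fatal errors.
    // On a fatal error the crawler is already trying to shut down, so the loop
    // stops and propagates the problem as a fatal plug-in exception.
    static List<String> crawl(List<String> urls, SketchSubmit submitter) throws SketchPluginException {
        List<String> skipped = new ArrayList<>();
        for (String url : urls) {
            try {
                submitter.submit(url);
            } catch (SketchProcessingException e) {
                if (e.isFatal()) {
                    throw new SketchPluginException("fatal error at " + url + ": " + e.getMessage(), true);
                }
                skipped.add(url); // non-fatal: move on to the next document
            }
        }
        return skipped;
    }
}
```

Returning the skipped URLs instead of silently dropping them lets the plug-in log them through the Logger or retry them on a later crawl.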